
KAFKA-13149: fix NPE for record==null when handling a produce request #11080

Merged: 3 commits into apache:trunk on Sep 14, 2021

Conversation

@ccding (Contributor) commented on Jul 19, 2021

This code in `DefaultRecord.readFrom`:

int sizeOfBodyInBytes = ByteUtils.readVarint(buffer);
if (buffer.remaining() < sizeOfBodyInBytes)
return null;

returns `record = null`, which subsequently causes a NullPointerException at
if (!record.hasMagic(batch.magic)) {

This PR makes the broker throw an `InvalidRecordException` and notify the client instead of hitting the NPE. The fix is similar to the existing validation later in the same method:

int numHeaders = ByteUtils.readVarint(buffer);
if (numHeaders < 0)
throw new InvalidRecordException("Found invalid number of record headers " + numHeaders);
final Header[] headers;
if (numHeaders == 0)
headers = Record.EMPTY_HEADERS;
else
headers = readHeaders(buffer, numHeaders);
// validate whether we have read all header bytes in the current record
if (buffer.position() - recordStart != sizeOfBodyInBytes)
throw new InvalidRecordException("Invalid record size: expected to read " + sizeOfBodyInBytes +
" bytes in record payload, but instead read " + (buffer.position() - recordStart));
return new DefaultRecord(sizeInBytes, attributes, offset, timestamp, sequence, key, value, headers);
} catch (BufferUnderflowException | IllegalArgumentException e) {
throw new InvalidRecordException("Found invalid record structure", e);
}

where we throw an `InvalidRecordException` when the record's integrity is broken.
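To make the original failure path concrete, here is a simplified sketch of the per-record validation that dereferences the null record; it is illustrative only, not the actual broker code, and the class and method names are made up:

```java
import org.apache.kafka.common.InvalidRecordException;
import org.apache.kafka.common.record.Record;
import org.apache.kafka.common.record.RecordBatch;

// Illustrative only: not the actual broker validation code.
final class ProduceValidationSketch {
    // Iterating a batch ends up calling DefaultRecord.readFrom for each record.
    // If readFrom returns null for a truncated record, the first dereference
    // below throws a NullPointerException instead of a clear, client-facing error.
    static void validateBatch(RecordBatch batch) {
        for (Record record : batch) {
            if (!record.hasMagic(batch.magic()))   // NPE here when record == null
                throw new InvalidRecordException("Record magic does not match batch magic");
        }
    }
}
```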

@ccding changed the title from "[WIP] fix NPE when record==null in append" to "fix NPE when record==null in append" on Jul 19, 2021
@ccding (Contributor, Author) commented on Jul 19, 2021

This PR is ready for review

The change under review:

-return null;
+throw new InvalidRecordException("Invalid record size: expected " + sizeOfBodyInBytes +
+    " bytes in record payload, but instead the buffer has only " + buffer.remaining() +
+    " remaining bytes.");
@ijuma (Contributor) commented:
Is this really an exceptional case? Don't we do reads where we don't know exactly where the read ends and hence will trigger this path?

@ccding (Contributor, Author) commented on Jul 20, 2021

Are you referring to the case where we have not yet finished reading the request? I didn't see a retry path, but it will cause a NullPointerException at

if (!record.hasMagic(batch.magic)) {

What do you suggest I do here?

@ijuma (Contributor) commented:

I think the intent here was to cover the case where an incomplete record is returned by the broker. However, we have broker logic to try and avoid this case since KIP-74:

} else if (!hardMaxBytesLimit && readInfo.fetchedData.firstEntryIncomplete) {
            // For FetchRequest version 3, we replace incomplete message sets with an empty one as consumers can make
            // progress in such cases and don't need to report a `RecordTooLargeException`
            FetchDataInfo(readInfo.fetchedData.fetchOffsetMetadata, MemoryRecords.EMPTY)

@hachikuji Do you remember if there is still a reason to return null here instead of the exception @ccding is proposing?

@ccding (Contributor, Author) commented on Jul 20, 2021

> the case where an incomplete record is returned by the broker

I am referring to the produce API for the NullPointerException. The record comes from a producer, and the InvalidRecordException will trigger an error response to that producer.

If the fetch path requires a different return value, I guess the problem becomes more complicated.
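For illustration, a rough sketch of how this would surface on the producer side, assuming the broker maps the `InvalidRecordException` to the `INVALID_RECORD` error in the produce response; the topic name, bootstrap server, and class name here are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.InvalidRecordException;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceErrorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"), (metadata, exception) -> {
                // With the proposed change, a malformed batch comes back as a clear
                // error instead of an unexplained NullPointerException on the broker.
                if (exception instanceof InvalidRecordException)
                    System.err.println("Broker rejected the batch: " + exception.getMessage());
            });
        }
    }
}
```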

@ijuma (Contributor) commented:

Yes, I understand you're talking about the producer case. I am talking about the fetch case. As I said, I think we may not need that special logic anymore, but @hachikuji would know for sure.

@ccding (Contributor, Author) commented:

@hachikuji do you have time to have a look at this?

@hachikuji commented on Sep 13, 2021

Apologies for the delay here. I don't see a problem with the change. I believe that @ijuma is right that the fetch response may still return incomplete data, but I think this is handled in ByteBufferLogInputStream. We stop batch iteration early if there is incomplete data, so we would never reach the readFrom here which is called for each record in the batch. It's worth noting also that the only caller of this method (in DefaultRecordBatch.uncompressedIterator) has the following logic:

try {
  return DefaultRecord.readFrom(buffer, baseOffset, firstTimestamp, baseSequence, logAppendTime);
} catch (BufferUnderflowException e) {
  throw new InvalidRecordException("Incorrect declared batch size, premature EOF reached");
}

So it already handles underflows in a similar way.
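In other words (a simplified, self-contained sketch of that early-stop idea, not the actual `ByteBufferLogInputStream` code; the length-prefix framing here is an assumption made for illustration):

```java
import java.nio.ByteBuffer;

// Illustrative sketch of why a truncated tail batch never reaches per-record
// parsing: iteration stops as soon as the buffer no longer holds a full batch.
final class EarlyStopSketch {
    // Assume each batch is framed as [4-byte size][size bytes of batch data].
    // Returns a slice with the next complete batch, or null if the buffer ends
    // mid-batch, in which case the caller simply stops iterating.
    static ByteBuffer nextCompleteBatch(ByteBuffer buffer) {
        if (buffer.remaining() < Integer.BYTES)
            return null;                                    // not even a size prefix left
        int declaredSize = buffer.getInt(buffer.position());
        if (buffer.remaining() < Integer.BYTES + declaredSize)
            return null;                                    // truncated batch: stop early
        buffer.position(buffer.position() + Integer.BYTES);
        ByteBuffer batch = buffer.slice();
        batch.limit(declaredSize);
        buffer.position(buffer.position() + declaredSize);
        return batch;
    }
}
```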

@ijuma (Contributor) commented:

Thanks for checking @hachikuji.

@ccding changed the title from "fix NPE when record==null in append" to "KAFKA-13149: fix NPE for record==null when handling a produce request" on Jul 29, 2021
@hachikuji left a comment:

LGTM

@hachikuji commented:
@ccding I kicked off a new build since it has been a while since the PR was submitted. Assuming tests are ok, I will merge shortly. Thanks for your patience.

@ccding (Contributor, Author) commented on Sep 14, 2021

The build Jason kicked off failed two tests:

Build / JDK 8 and Scala 2.12 / testDescribeTopicsWithIds() – kafka.api.PlaintextAdminIntegrationTest
Build / JDK 11 and Scala 2.13 / shouldQueryStoresAfterAddingAndRemovingStreamThread – org.apache.kafka.streams.integration.StoreQueryIntegrationTest

Both passed in my local run after merging trunk into this branch.

I am pushing the trunk merge to this branch to let Jenkins run again.

@hachikuji merged commit 75795d1 into apache:trunk on Sep 14, 2021
hachikuji pushed a commit that referenced this pull request Sep 14, 2021
…equests (#11080)

Raise `InvalidRecordException` from `DefaultRecordBatch.readFrom` instead of returning null if there are not enough bytes remaining to read the record. This ensures that the broker can raise a useful exception for malformed record batches.

Reviewers: Ismael Juma <[email protected]>, Jason Gustafson <[email protected]>
@ccding deleted the ak-record-null branch on September 16, 2021 at 14:22
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
…equests (apache#11080)
